Abstract: In this paper, we propose a novel multi-task multivariate
(MTMV) sparse representation method for multi-sensor
classification, which accounts for correlations between sensors
while simultaneously exploiting joint sparsity within each
sensor's observations. This approach can be seen as a
generalization of the multi-task and multivariate Lasso, in which
all the multi-sensor data are jointly represented by a sparse
linear combination of training data. We further extend our MTMV
model by including a clutter noise term that is also assumed to be
sparse in the feature domain. An efficient algorithm based on the
alternating direction method is proposed for both models.
Extensive experiments are conducted on a real data set, and the
results are compared with conventional discriminative classifiers
to verify the effectiveness of the proposed methods.

I.  Introduction 

Multi-sensor fusion has received a considerable amount of
attention over the past few years for both military and
non-military tasks [1] [2] [3]. A problem of particular interest in
multi-sensor fusion is classification, where the central question
is how to exploit related information from different sources
(sensors) recording the same physical event to achieve improved
classification performance. A variety of approaches have been
proposed in the literature to answer this question [4] [5]. These
methods mostly fall into two categories: decision in - decision
out (DI-DO) and feature in - feature out (FI-FO) [3]. In [4], the
authors investigated the DI-DO method on a vehicle classification
problem using data collected from acoustic and seismic sensors.
They proposed to perform local classification for each sensor
signal by conventional methods such as the Support Vector Machine
(SVM). These local decisions are then combined via a maximum a
posteriori (MAP) estimator to make the final classification
decision, hence the name DI-DO. In [5], the FI-FO method is
studied for vehicle classification using both visual and acoustic
sensors. The authors proposed a method to extract temporal gait
patterns from both sensor signals and use them as inputs to an SVM
classifier. They furthermore compared the DI-DO and FI-FO
approaches on their dataset and showed that FI-FO achieves higher
discrimination performance than DI-DO.

In signal processing, most natural signals are inherently sparse
in certain bases or dictionaries, where they can be approximately
represented by only a few significant components carrying the
most relevant information. In other words, the intrinsic signal
information usually lies in a low-dimensional subspace and the
semantic information is often encoded in the sparse
representation. Especially with the emergence of the Compressed
Sensing (CS) framework [6] [7], sparse representation and the
related optimization problems that use sparsity as a prior, called
sparse recovery, have increasingly attracted the interest of
researchers in diverse disciplines.

Sparsity has been successfully employed in inverse problems,
where it acts as a strong prior that alleviates the ill-posedness
of the problem. Recent research has pointed out that sparse
representation is also useful for discriminative applications [8]
[9] [10]. These applications rely on the crucial observation that
it is possible to represent the test sample as a linear
combination of training samples belonging to the same class as
the target and not to the other classes. Thus, if the dictionary
is constructed from all the training samples in all the classes,
the test samples can be sparsely represented by only a few columns
of this dictionary. Therefore, the sparse coefficient vector,
which is recovered efficiently via $\ell_1$-minimization, can
naturally be considered as the discriminative factor. In [9], the
authors successfully applied this idea to the face recognition
problem. Since then, many more sophisticated techniques have been
developed and applied to various fields such as hyperspectral
target detection [11], acoustic signal classification [12] and
visual classification [8] [13] [14]. For instance, the authors of
[12] proposed a joint sparse model for acoustic signal
classification, which exploits the fact that multiple observations
from the same class can be simultaneously represented by a few
columns of the training dictionary. Thus, the coefficient vectors
associated with these observations are expected to share the same
sparsity pattern. Similarly, [14] investigated a multi-task model
for visual classification, which also assumes that tasks belonging
to the same class have the same sparse support in their
coefficient vectors. To improve classification performance, all
these models require efficient algorithms that exploit this
sparsity structure as prior information.

In this paper, we consider a multi-sensor classification problem,
focusing on discriminating between human and non-human footstep
activity. The experimental setup is as follows: a set of four
acoustic, three seismic, one passive infrared (PIR) and one
ultrasonic sensor is used to measure the same physical event
simultaneously in the field. The ultimate goal is to detect
whether the event consists of human-only footsteps or of human and
animal footsteps. As opposed to previous approaches to this
problem, in which only a single sensor is used to record data [15]
[16] [17], in this paper we propose a novel regularized regression
method, namely the multi-task multivariate Lasso, which
effectively incorporates both the multi-task and multivariate
Lasso ideas [8] [18]. This technique imposes group sparsity both
within each task (sensor) and across the tasks. Furthermore, we
extend our model to deal with sparse noise of arbitrarily large
magnitude. This high-energy environmental noise frequently appears
in sensor data due to the unpredictable or uncontrollable nature
of the environment during the data collection process. Though our
technique is designed for military purposes, it is not restricted
to this specific application. Rather, it can be applied to any
classification or discrimination problem in which data is
collected from multiple sources.

The remainder of this paper is organized as follows. Section II
briefly introduces various sparsity models, with the main focus on
our proposed multi-task multivariate (MTMV) and MTMV with sparse
noise models in subsections C and D, respectively. We present in
Section III a fast and efficient algorithm to solve the convex
optimizations for these models. Extensive experiments are shown in
Section IV and conclusions are drawn in Section V.

II.  Sparsity  models 

Consider a multi-task (multi-sensor) C-class classification
problem. Suppose we have a training set of p samples in which each
sample has D different modalities of features, or D different
tasks. For each task (sensor) $i = 1, \ldots, D$, we denote
$X^i = [X^i_1, X^i_2, \ldots, X^i_C]$ as an $n \times p$
dictionary, consisting of C sub-dictionaries $X^i_j$ with respect
to the C classes. Here, each sub-dictionary
$X^i_j = [x^i_{j,1}, x^i_{j,2}, \ldots, x^i_{j,p_j}] \in \mathbb{R}^{n \times p_j}$
represents a set of training data from the ith task labeled with
the jth class. Accordingly, $x^i_{j,k}$, which we usually call an
atom of the dictionary, is the kth training sample for the ith
task and jth class. Notice that $p_j$ is the number of training
samples for the jth class and n is the feature dimension of each
sample; therefore, the total number of samples is
$p = \sum_{j=1}^{C} p_j$. Given a test sample Y comprising D tasks
$\{Y^1, Y^2, \ldots, Y^D\}$, where each sample task $Y^i$ consists
of $d_i$ observations
$Y^i = [y^i_1, y^i_2, \ldots, y^i_{d_i}] \in \mathbb{R}^{n \times d_i}$,
we want to decide which class the sample Y belongs to.

A.  Sparse  representation  for  classification  (SRC) 

We first review the single-task (single-sensor) sparse
representation for classification method. Accordingly, we drop the
task index i for simplicity. In this problem, a particular and
effective model is to assume that the training samples belonging
to the same class approximately lie on a low-dimensional subspace.
Given a dictionary of C distinct classes
$X = [X_1, X_2, \ldots, X_C]$, where the jth class has $p_j$
training samples $\{x_{j,k}\}_{k=1,\ldots,p_j}$, a new test sample
y belonging to the jth class will approximately lie in the linear
space spanned by the training samples associated with the jth
class:

$$y = Xw + n, \tag{1}$$

where w is the coefficient vector whose entries are zero except
those associated with the jth class,
$w = [0^T, \ldots, 0^T, w_j^T, 0^T, \ldots, 0^T]^T$, and n is a
small noise term due to the imperfection of the test sample.

In order to obtain the sparse vector w, it is natural to consider
the following optimization

$$\hat{w} = \arg\min_{w} \|w\|_0 \quad \text{subject to} \quad \|y - Xw\|_2 \le \epsilon, \tag{2}$$

where $\|w\|_0$ is the $\ell_0$-norm, defined as the number of
non-zero entries of w, and $\epsilon$ is the noise energy. This
$\ell_0$-norm minimization can be interpreted as finding the
sparsest solution obeying the quadratic constraint. However, (2)
is well known to be NP-hard, reflecting the combinatorial,
non-convex and non-differentiable nature of the $\ell_0$-norm.
Many alternative approaches have been proposed to approximately
solve (2), such as greedy pursuits (Orthogonal Matching Pursuit
[19], Subspace Pursuit [20]) and Iterative Hard Thresholding (IHT)
[21]. Alternatively, the $\ell_0$-minimization can be relaxed to
an $\ell_1$-based convex program that can be solved efficiently
[6] [7]. In this paper, we use the $\ell_1$-minimization approach,
which is described as follows

$$\hat{w} = \arg\min_{w} \frac{1}{2}\|y - Xw\|_2^2 + \lambda \|w\|_1, \tag{3}$$

where $\lambda$ is a positive regularization parameter and the
$\ell_1$-norm is $\|w\|_1 = \sum_i |w_i|$. This optimization is
also known as the Lasso [22], which can be solved efficiently in
polynomial time by standard convex optimization techniques.

Once the coefficient vector $\hat{w}$ is estimated, the class
label of y is determined by the minimal residual between y and its
approximation from each class sub-dictionary

$$\hat{j} = \arg\min_{j} \|y - X\delta_j(\hat{w})\|_2, \tag{4}$$

where $\delta_j(\cdot)$ is a vector indicator function, defined by
keeping the coefficients corresponding to the jth class and
setting all others to zero, i.e.
$\delta_j(w) = [0^T, \ldots, 0^T, w_j^T, 0^T, \ldots, 0^T]^T$.
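The two steps above, the Lasso fit (3) followed by the minimal-residual rule (4), can be sketched in a few lines of NumPy. This is only an illustrative sketch: the solver is a plain iterative soft-thresholding (ISTA) loop chosen for brevity, not the algorithm used in this paper, and the names `lasso_ista`, `src_classify` and the `labels` array (the class of each dictionary column) are assumptions introduced here.

```python
import numpy as np

def soft_threshold(z, tau):
    """Entry-wise soft-thresholding: sgn(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Approximately solve min_w 0.5*||y - Xw||_2^2 + lam*||w||_1 (eq. 3) by ISTA."""
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def src_classify(X, labels, y, lam=0.01):
    """Minimal-residual rule (eq. 4): keep only the coefficients of one class at a time."""
    w = lasso_ista(X, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == c] @ w[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]

# Toy usage: six orthonormal atoms, three per class (hypothetical data).
X = np.eye(6)
labels = np.array([0, 0, 0, 1, 1, 1])
y = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 0.0])
predicted = src_classify(X, labels, y)
```

In the toy usage the test sample is built from class-1 atoms only, so the class-1 residual is the smaller one.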

B. Multivariate sparse representation for classification (MV-SRC)

We now consider a single-task sparse representation in which the
test sample is generated by a single sensor but may consist of
multiple observations of the same physical event obtained by that
sensor: $Y = [y_1, y_2, \ldots, y_d] \in \mathbb{R}^{n \times d}$.
In our problem, each observation is one segment of the test
signal, where the segments are obtained by partitioning the test
signal into d (overlapping) segments. Again, suppose the test
signal belongs to the jth class. It can often be assumed that each
observation $y_i$ is a linear combination of training samples in
the sub-dictionary $X_j$, which consists of training segments.
That is, for all $i = 1, \ldots, d$, $y_i = Xw_i + n_i$, where
$X = [X_1, X_2, \ldots, X_C]$ is a concatenation of C
sub-dictionaries, the $w_i$'s are sparse vectors whose nonzero
entries are associated with the jth class,
$w_i = [0^T, \ldots, 0^T, w_{i,j}^T, 0^T, \ldots, 0^T]^T$, and the
$n_i$'s are small noise vectors. If we denote
$W = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^{p \times d}$, then W
is a row-sparse matrix with only $p_j$ nonzero rows, located at
the rows associated with the jth class.

To recover the row-sparse matrix W, the following joint sparse
optimization with $q > 1$, well known as the group Lasso [18]
[23], is used:

$$\hat{W} = \arg\min_{W} \frac{1}{2}\|Y - XW\|_F^2 + \lambda \|W\|_{1,q}, \tag{5}$$

where $\lambda$ is a positive regularization parameter, and
$\|W\|_{1,q}$ is a norm defined as
$\|W\|_{1,q} = \sum_{k=1}^{p} \|w^k\|_q$, where the $w^k$'s are
the row vectors of W. This norm can be read as taking the
$\ell_q$-norm of each row (across the columns, i.e. the
observations) and then the $\ell_1$-norm of the resulting vector
of row norms. This $\ell_1/\ell_q$ regularization encourages
shared patterns across related observations, and thus the columns
of the solution of (5) share a common support.

The class label is determined by the following rule

$$\hat{j} = \arg\min_{j} \|Y - X\delta_j(\hat{W})\|_F, \tag{6}$$

where $\delta_j(\cdot)$ is the matrix indicator function, defined
by keeping the rows corresponding to the jth class and setting all
others to zero.
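To make the mixed norm in (5) and the decision rule (6) concrete, here is a small NumPy sketch. The function names `l1q_norm` and `mv_classify` and the `labels` array (class of each dictionary column, hence of each row of W) are illustrative assumptions; the coefficient matrix is assumed to come from some group-Lasso solver that is not shown.

```python
import numpy as np

def l1q_norm(W, q=2):
    """||W||_{1,q}: l_q norm of every row, then the sum over rows."""
    return float(np.sum(np.linalg.norm(W, ord=q, axis=1)))

def mv_classify(X, labels, Y, W):
    """Rule (6): Frobenius residual using only the rows of W of one class at a time."""
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        mask = labels == c
        residuals.append(np.linalg.norm(Y - X[:, mask] @ W[mask, :], "fro"))
    return classes[int(np.argmin(residuals))]

# Toy usage with a row-sparse coefficient matrix (hypothetical data).
W_demo = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
norm_12 = l1q_norm(W_demo, q=2)   # row norms 5, 0 and 1
norm_11 = l1q_norm(W_demo, q=1)   # row norms 7, 0 and 1
```

Only rows with a nonzero norm contribute to the penalty, which is why minimizing it drives entire rows of W to zero.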

C. Multi-task multivariate sparse representation for classification
(MTMV-SRC)

In the previous section, we employed a single-source sparse
representation for classification. In the scenario where an event
is captured by multiple heterogeneous sources (sensors), multiple
observations from each source are available in the test sample. By
exploiting the correlation between different sources, we can
potentially improve classification accuracy. To handle multiple
sources, a naive approach is to use a voting scheme (the DI-DO
method), where for each sensor the two-step classification
algorithm described in Section II-B is performed and a class label
is assigned. The final decision is made by selecting the label
that occurs most often. This approach does not exploit the
relationship between different sources except at the
post-processing step, where the decision is made via fusion.

In this section, an alternative approach is proposed in which we
exploit the joint sparsity of coefficient vectors from different
sources in order to make a joint classification decision. To
illustrate this model, let us first consider a two-task
classification problem with the test sample Y consisting of two
tasks $Y^1$ and $Y^2$ collected from 2 different sensors. Suppose
that $Y^1$ belongs to the jth class. It can then be reconstructed
by a linear combination of the atoms in the sub-dictionary
$X^1_j$. That is, $Y^1 = X^1W^1 + N^1$, where $W^1$ is a sparse
matrix with only $p_j$ nonzero rows associated with the jth class
and $N^1$ is a small noise matrix.

Since $Y^2$ represents the same event, it belongs to the same
class, and thus can be approximated by the training samples in
$X^2$ with a different set of coefficients $W^2$:
$Y^2 = X^2W^2 + N^2$, where $W^2$ has the same sparsity pattern as
$W^1$.

If we denote $W = [W^1, W^2]$, then W is a sparse matrix with only
$p_j$ nonzero rows. Therefore, in order to recover this row-sparse
matrix, we should incorporate this common sparsity pattern as a
prior in the optimization algorithm. In the more general case
where we have D sources (sensors), we denote $\{Y^i\}_{i=1}^{D}$
as a set of D observations, each consisting of d segments,
collected from the D sensors, and let
$W \in \mathbb{R}^{p \times dD}$ be an unknown matrix formed by
concatenating the coefficient matrices,
$W = [W^1, W^2, \ldots, W^D]$. This matrix W can be recovered by
solving the following $\ell_1/\ell_q$-regularized least squares
problem

$$\hat{W} = \arg\min_{W} \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^iW^i\|_F^2 + \lambda \|W\|_{1,q}, \tag{7}$$

where $\lambda$ is a positive parameter and q is set to be greater
than 1, which keeps the optimization convex while coupling the
entries within each row. The optimization (7) is called the
multi-task multivariate Lasso.

Once $\hat{W}$ is obtained, the class label is decided by the
minimal residual rule

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i\delta^i_j(\hat{W}^i)\|_F^2, \tag{8}$$

where $\delta^i_j$ is the matrix indicator function associated
with the ith sensor, defined as in the previous section.

D.  Multi-task  multivariate  sparse  representation  with  sparse 
noise  (MTMV-SRC+N) 

During the process of collecting data in the field, many sources
of environmental clutter noise, such as impulsive noise or wind
noise, affect the true characteristics of the signal.
Unfortunately, these noise sources are uncontrollable and can have
arbitrarily large magnitude, which sometimes dominates the
collected signal. Removing these clutter noises can therefore be
expected to improve overall classification performance.
Fortunately, these types of noise usually occur in certain
frequency bands. We expect that they will only affect some
coefficients in our cepstral feature domain, which is the feature
space used in our experiments. In this case, the linear model of
the observation $Y^i$, $i = 1, \ldots, D$, with respect to the
training data $X^i$ should be modified as

$$Y^i = X^iW^i + E^i + N^i, \quad i = 1, \ldots, D, \tag{9}$$

where $N^i$ is a small dense additive noise and
$E^i \in \mathbb{R}^{n \times d_i}$ is a matrix of clutter noise
with arbitrarily large magnitude. The nonzero entries of $E^i$
indicate which cepstral features of $Y^i$ are corrupted. Note that
it might be possible that all these cepstral features are
corrupted. The location of the corruption can differ between tasks
since sensors with different characteristics will have different
types of errors. In a more general scenario, one can assume that
each error $E^i$ is sparsely represented with respect to some
basis $T^i \in \mathbb{R}^{n \times m_i}$. That is,
$E^i = T^iZ^i$ for some sparse matrix
$Z^i \in \mathbb{R}^{m_i \times d_i}$.

The idea of exploiting a sparse prior on the error has been
developed by Wright et al. [9] in the context of face recognition
and by Candes et al. [24] in robust principal component analysis.
In this section, we propose a new sparse representation method
that simultaneously performs classification and removes clutter
noise. Taking advantage of the knowledge that the errors $E^i$ are
sparse, we propose to solve the following optimization to retrieve
the coefficients $W^i$ as well as the errors $E^i$

$$(\hat{W}, \hat{E}) = \arg\min_{W,E} \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^iW^i - E^i\|_F^2 + \lambda_W \|W\|_{1,q} + \lambda_E \|E\|_1, \tag{10}$$

where $\lambda_W$ and $\lambda_E$ are positive parameters, and E
is a matrix formed by the error matrices $E^i$:
$E = [E^1, E^2, \ldots, E^D]$. The $\ell_1$-norm of the matrix E
is defined as the sum of the absolute values of its entries:
$\|E\|_1 = \sum_{i,j} |e_{ij}|$. This minimization imposes both
common (row) sparsity on W and entry-wise sparsity on the error E.

Once the sparse solution $\hat{W}$ and the error $\hat{E}$ are
computed, the clean cepstral features $\hat{Y}^i$ are recovered by
setting $\hat{Y}^i = Y^i - \hat{E}^i$. To identify the class, we
slightly modify the label inference in (8) to account for the
error $\hat{E}^i$

$$\hat{j} = \arg\min_{j} \sum_{i=1}^{D} \|Y^i - X^i\delta^i_j(\hat{W}^i) - \hat{E}^i\|_F^2. \tag{11}$$
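As an illustration of the multi-sensor decision rules, the sketch below sums the per-sensor residuals as in (11) and falls back to (8) when no error estimates are supplied. It assumes, for simplicity, that every sensor dictionary orders its atoms by class in the same way, so that one `labels` array serves all sensors; the function name `mtmv_classify` and its argument names are hypothetical, and the estimates of W and E are assumed to come from a solver that is not shown.

```python
import numpy as np

def mtmv_classify(Xs, Ys, Ws, labels, Es=None):
    """argmin_j sum_i ||Y^i - X^i delta_j(W^i) - E^i||_F^2 (eq. 11).

    Xs, Ys, Ws (and Es) are lists with one matrix per sensor.
    With Es=None the error term is dropped, which gives rule (8).
    """
    if Es is None:
        Es = [np.zeros_like(Y) for Y in Ys]
    classes = np.unique(labels)
    totals = []
    for c in classes:
        mask = labels == c
        total = sum(
            np.linalg.norm(Y - X[:, mask] @ W[mask, :] - E, "fro") ** 2
            for X, Y, W, E in zip(Xs, Ys, Ws, Es)
        )
        totals.append(total)
    return classes[int(np.argmin(totals))]

# Toy usage: two identical sensors, class-0 rows active, one corrupted entry.
X = np.eye(4)
labels = np.array([0, 0, 1, 1])
W = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
E = np.zeros((4, 2))
E[3, 0] = 10.0
Y = X @ W + E
decision = mtmv_classify([X, X], [Y, Y], [W, W], labels, Es=[E, E])
```

Subtracting the estimated error before comparing residuals keeps a single large corrupted feature from dominating the class decision.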

III.  Algorithm 

In this section, we propose a fast algorithm to solve (10). The
optimization (7) can be solved similarly by dropping the error
term, i.e. fixing E = 0 in (10). Our algorithm relies on the
classical alternating direction method of multipliers (ADMM). This
method has recently been applied successfully to $\ell_1$-norm
minimization [25] [26].

Denote the loss function
$\mathcal{L}(W, E) = \frac{1}{2}\sum_{i=1}^{D} \|Y^i - X^iW^i - E^i\|_F^2$.
Our ultimate goal is to solve the following optimization

$$\min_{W,E} \mathcal{L}(W, E) + \lambda_E \|E\|_1 + \lambda_W \|W\|_{1,q}. \tag{12}$$


One of the key ideas of the algorithm is to decouple
$\mathcal{L}(W, E)$, $\|E\|_1$ and $\|W\|_{1,q}$. This can be done
by introducing auxiliary variables to reformulate the problem as a
constrained optimization

$$\min_{W,E,U,V} \mathcal{L}(W, E) + \lambda_E \|U\|_1 + \lambda_W \|V\|_{1,q} \quad \text{subject to} \quad W = V, \; E = U. \tag{13}$$


The reason behind the variable splitting method is that it might
be easier to solve the constrained problem (13) than its
unconstrained counterpart (12). Since (13) is an equality
constrained problem, the Augmented Lagrangian method (ALM) can be
used to solve it by minimizing the augmented Lagrangian function
$f_{\beta_E,\beta_W}(W, E, U, V; B_E, B_W)$ defined as

$$\mathcal{L}(W, E) + \lambda_E \|U\|_1 + \langle B_E, E - U \rangle + \frac{\beta_E}{2}\|E - U\|_F^2 + \lambda_W \|V\|_{1,q} + \langle B_W, W - V \rangle + \frac{\beta_W}{2}\|W - V\|_F^2, \tag{14}$$

where $B_E$ and $B_W$ are the multipliers of the two linear
constraints, and $\beta_E$, $\beta_W$ are positive penalty
parameters. The ALM consists in minimizing
$f_{\beta_E,\beta_W}(W, E, U, V; B_E, B_W)$ with respect to W, E,
U and V jointly, keeping $B_E$ and $B_W$ fixed, and then updating
$B_E$ and $B_W$. However, this joint minimization is often not
easy. Fortunately, by exploiting the separable structure of the
objective function $f_{\beta_E,\beta_W}$, we can further simplify
the problem by minimizing $f_{\beta_E,\beta_W}$ with respect to
the variables W, E, U and V one at a time. This method is called
the alternating direction method of multipliers (ADMM) and is
given below.


1  Choose $W_0$, $E_0$, $U_0$, $V_0$, $B_{E,0}$, $B_{W,0}$ and $\beta_E$, $\beta_W$
2  While not converged do
3    $W_{t+1} = \arg\min_{W} f(W, E_t, U_t, V_t; B_{E,t}, B_{W,t})$
4    $E_{t+1} = \arg\min_{E} f(W_{t+1}, E, U_t, V_t; B_{E,t}, B_{W,t})$
5    $U_{t+1} = \arg\min_{U} f(W_{t+1}, E_{t+1}, U, V_t; B_{E,t}, B_{W,t})$
6    $V_{t+1} = \arg\min_{V} f(W_{t+1}, E_{t+1}, U_{t+1}, V; B_{E,t}, B_{W,t})$
7    $B_{E,t+1} := B_{E,t} + \beta_E (E_{t+1} - U_{t+1})$
8    $B_{W,t+1} := B_{W,t} + \beta_W (W_{t+1} - V_{t+1})$

Algorithm 1: HMT representation via ADMM
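Algorithm 1 can be sketched compactly in NumPy for the case q = 2, using the closed-form updates given in (15), (16), (18) and the q = 2 solution of (20). The sketch makes several simplifying assumptions that are not from the paper: all dictionaries share the same number of atoms p, all variables start at zero, a fixed iteration count replaces the convergence test, and the function and argument names (`mtmv_admm`, `lam_w`, `lam_e`, `beta_w`, `beta_e`) are hypothetical.

```python
import numpy as np

def shrink(Z, tau):
    """Entry-wise soft-thresholding used for the U-update."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def row_shrink(Z, tau):
    """Row-wise (group) soft-thresholding used for the V-update with q = 2."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * Z

def mtmv_admm(Xs, Ys, lam_w, lam_e, beta_w=1.0, beta_e=1.0, n_iter=200):
    """Sketch of Algorithm 1; returns per-sensor blocks of V and the list of U."""
    D = len(Xs)
    p = Xs[0].shape[1]
    ds = [Y.shape[1] for Y in Ys]
    splits = np.cumsum(ds)[:-1]          # column boundaries between sensors
    W = [np.zeros((p, d)) for d in ds]
    E = [np.zeros_like(Y) for Y in Ys]
    U = [np.zeros_like(Y) for Y in Ys]
    B_E = [np.zeros_like(Y) for Y in Ys]
    V = np.zeros((p, sum(ds)))
    B_W = np.zeros_like(V)
    # The matrix inverted in the W-update does not change across iterations.
    invs = [np.linalg.inv(X.T @ X + beta_w * np.eye(p)) for X in Xs]
    for _ in range(n_iter):
        V_blocks = np.split(V, splits, axis=1)
        BW_blocks = np.split(B_W, splits, axis=1)
        for i in range(D):
            # Lines 3-5: W-, E- and U-updates decouple across sensors.
            W[i] = invs[i] @ (Xs[i].T @ (Ys[i] - E[i])
                              + beta_w * V_blocks[i] - BW_blocks[i])
            E[i] = (Ys[i] - Xs[i] @ W[i] + beta_e * U[i] - B_E[i]) / (1.0 + beta_e)
            U[i] = shrink(E[i] + B_E[i] / beta_e, lam_e / beta_e)
        # Line 6: the V-update couples all sensors through shared rows.
        W_cat = np.hstack(W)
        V = row_shrink(W_cat + B_W / beta_w, lam_w / beta_w)
        # Lines 7-8: multiplier updates.
        for i in range(D):
            B_E[i] = B_E[i] + beta_e * (E[i] - U[i])
        B_W = B_W + beta_w * (W_cat - V)
    return np.split(V, splits, axis=1), U
```

The sketch returns the auxiliary variables V and U rather than W and E because the thresholding steps make them exactly row-sparse and entry-sparse, which is convenient for the residual-based class decision.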


The first optimization subproblem with respect to W (line 3 of
Algorithm 1) has a quadratic structure and is thus easy to solve
by setting the first-order derivative to zero. Furthermore, since
the loss function $\mathcal{L}(W, E)$ is a sum of convex functions
associated with the sub-matrices $W^i$, one can solve for each
$W^i_{t+1}$ independently, with the explicit solution

$$W^i_{t+1} = (X^{iT}X^i + \beta_W I)^{-1}\left[X^{iT}(Y^i - E^i_t) + \beta_W V^i_t - B^i_{W,t}\right], \tag{15}$$

where I is an identity matrix of size $p \times p$, and $E^i_t$,
$V^i_t$, $B^i_{W,t}$ are the sub-matrices of $E_t$, $V_t$ and
$B_{W,t}$ associated with the ith sensor, respectively.

The second optimization subproblem with respect to E (line 4 of
Algorithm 1) has a similar structure and can be solved by

$$E^i_{t+1} = (1 + \beta_E)^{-1}\left(Y^i - X^iW^i_{t+1} + \beta_E U^i_t - B^i_{E,t}\right). \tag{16}$$


The third optimization subproblem with respect to U (line 5 of
Algorithm 1) is a standard $\ell_1$-minimization. In fact, by
simple algebra, this optimization is equivalent to

$$\min_{U} \frac{1}{2}\left\|E_{t+1} + \frac{1}{\beta_E}B_{E,t} - U\right\|_F^2 + \frac{\lambda_E}{\beta_E}\|U\|_1, \tag{17}$$

which is the well-known shrinkage problem. The explicit solution
is given by

$$U_{t+1} = \mathrm{Shrink}\left(E_{t+1} + \frac{1}{\beta_E}B_{E,t}, \; \frac{\lambda_E}{\beta_E}\right), \tag{18}$$

where the operator Shrink(·) acts entry-wise and is defined by
$\mathrm{Shrink}(z, \epsilon) = \mathrm{sgn}(z)(|z| - \epsilon)$
for $|z| > \epsilon$ and zero otherwise.
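The step from the U-subproblem to (17) is a completion of the square. As a sketch of that step, collecting the terms of the augmented Lagrangian (14) that depend on U, with the multiplier and penalty symbols $B_{E,t}$ and $\beta_E$ used in this section:

```latex
\begin{aligned}
&\lambda_E \|U\|_1 + \langle B_{E,t},\, E_{t+1} - U \rangle
   + \frac{\beta_E}{2}\,\|E_{t+1} - U\|_F^2 \\
&\quad = \lambda_E \|U\|_1
   + \frac{\beta_E}{2}\,\Big\| E_{t+1} + \frac{1}{\beta_E} B_{E,t} - U \Big\|_F^2
   - \frac{1}{2\beta_E}\,\|B_{E,t}\|_F^2 .
\end{aligned}
```

The last term does not depend on U, so dropping it and dividing by $\beta_E$ leaves the objective in (17). The same manipulation applied to the terms in V yields (19).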

The last optimization subproblem with respect to V (line 6 of
Algorithm 1) can be recast as

$$\min_{V} \frac{1}{2}\left\|W_{t+1} + \frac{1}{\beta_W}B_{W,t} - V\right\|_F^2 + \frac{\lambda_W}{\beta_W}\|V\|_{1,q}. \tag{19}$$

It is clear that the objective function in (19) has a separable
structure. Therefore, this optimization can be tackled by
minimizing with respect to each row of V separately. Specifically,
if we denote $w_{i,t+1}$, $b_{W,i,t}$ and $v_{i,t+1}$ as the ith
rows of the matrices $W_{t+1}$, $B_{W,t}$ and $V_{t+1}$,
respectively, then for each $i = 1, \ldots, p$ we solve the
sub-problem

$$v_{i,t+1} = \arg\min_{v} \frac{1}{2}\|z - v\|_2^2 + \gamma \|v\|_q, \tag{20}$$

where $z := w_{i,t+1} + b_{W,i,t}/\beta_W$ and
$\gamma := \lambda_W/\beta_W$. Although (20) is simple enough to
admit an explicit solution for other values of q as well, in this
paper we only consider the simplest case q = 2, for which the
solution of (20) has the explicit form

$$v_{i,t+1} = \left(1 - \frac{\gamma}{\|z\|_2}\right)_+ z, \tag{21}$$

where $(u)_+ = \max(u, 0)$, applied entry-wise when u is a vector.
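For completeness, here is a short sketch of why the q = 2 case of (20) has a closed-form solution. It uses only the first-order optimality condition of the row-wise objective, with z and $\gamma$ as defined for (20):

```latex
\begin{aligned}
&\text{If } v \neq 0:\quad
  0 = v - z + \gamma \frac{v}{\|v\|_2}
  \;\Longrightarrow\;
  z = \Big(1 + \frac{\gamma}{\|v\|_2}\Big) v , \\
&\text{so } v \text{ is a positive multiple of } z \text{ and }
  \|z\|_2 = \|v\|_2 + \gamma
  \;\Longrightarrow\;
  v = \Big(1 - \frac{\gamma}{\|z\|_2}\Big) z
  \quad (\text{requires } \|z\|_2 > \gamma), \\
&\text{If } \|z\|_2 \le \gamma:\quad
  z \in \gamma\, \partial \|v\|_2 \big|_{v = 0}
  = \{ g : \|g\|_2 \le \gamma \}
  \;\Longrightarrow\; v = 0 .
\end{aligned}
```

Both cases combine into $v = (1 - \gamma/\|z\|_2)_+\, z$: a row of V is set to zero as a whole whenever its norm falls below the threshold $\gamma$, which is what produces the shared row support across sensors.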


IV.  Simulation  results  and  analysis 

In  this  section,  we  perform  extensive  experiments  on  real 
multi-sensor  data  sets  and  compare  the  results  with  several 
conventional  classification  methods  such  as  logistic  regression 
and  SVM  to  verify  the  effectiveness  of  our  proposed  approach. 


A.  Experiment  setup 

In this part, we briefly explain the experiment setup.

1. Data collection. Footstep data were collected over two days by
two sets of nine sensors, each consisting of four acoustic, three
seismic, one passive infrared (PIR) and one ultrasonic sensor (see
Fig. 1 for 3 different types of sensors). Tests with human
footsteps include one person walking, one person jogging, two
people walking, two people running, and a group of multiple people
walking or running. Tests with human-animal footsteps include one
person leading a horse or a dog, two people leading a horse and a
mule, three people leading a horse, a mule and a donkey, and a
group of multiple people with several dogs. In each test, people
and animals could be carrying varying loads, such as a backpack or
a metal pipe. The test subjects include both males and females.

During each run, test subjects are asked to follow a path along
which the two sets of sensors are positioned and then return to
the starting point. The two sensor sets are placed at separate
locations, and each set consists of all nine sensors. A total of
69 round-trip runs were conducted in two days, including 34 runs
for human footsteps and 35 runs for human-animal footsteps. The
collected data, named DEC09 and DEC10 after the two collection
days, December 09 and 10, is summarized in Table I.

TABLE I
Total amount of data collected in two days.

Data     Human    Human leading animal
DEC09    16       15
DEC10    18       20

Fig. 1. Four acoustic sensors (top left), seismic sensor (top right) and passive
infrared (PIR) sensor (bottom).

2. Segmentation. To accurately perform classification, it is
necessary to extract the actual events from the run series.
Although the raw signal of each run might be several minutes in
length, the event is much shorter, as it occurs in the short
period of time when the test subject is close to the sensors. In
addition, the event can be at an arbitrary location within the
recording. To extract useful features, we need to detect the time
locations where the physical event occurs. To do this, we identify
the location with the strongest signal response by the spectral
maximum detection method [27]. From this location, 10 segments
with 75% overlap on both sides of the signal are taken, each with
30000 samples, corresponding to 3 seconds of signal. This process
is performed for all sensor data. Overall, for each run, we have 9
signals captured by 9 sensors, and each signal is divided into 10
overlapping segments; thus D = 9 and each $d_i = 10$,
$i = 1, \ldots, D$.
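The segmentation step can be sketched as follows. The sketch assumes that the event location (`center`, a sample index) has already been found by some detector, since the spectral maximum detection method of [27] is not described here, and it interprets "on both sides" as a window of 10 overlapping segments centered on that location. The function name `extract_segments` and its parameters are hypothetical.

```python
import numpy as np

def extract_segments(signal, center, n_segments=10, seg_len=30000, overlap=0.75):
    """Cut n_segments overlapping windows of seg_len samples around `center`."""
    hop = int(round(seg_len * (1.0 - overlap)))   # 7500 samples for 75% overlap
    span = seg_len + (n_segments - 1) * hop       # samples covered by all windows
    if span > len(signal):
        raise ValueError("signal is too short for the requested segments")
    # Center the covered span on the detected event, clamped to the signal bounds.
    start = min(max(center - span // 2, 0), len(signal) - span)
    return np.stack([signal[start + k * hop: start + k * hop + seg_len]
                     for k in range(n_segments)])

# Toy usage on a synthetic ramp signal (hypothetical data).
demo_signal = np.arange(200000)
segments = extract_segments(demo_signal, center=100000)
```

With the default settings the ten windows cover 97500 consecutive samples, so consecutive segments share 22500 of their 30000 samples.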

Fig. 2 shows signals captured by four distinct sensors where the
event is one person walking. As one can see, different sensors
exhibit different signal behaviors. The seismic signal shows the
cadence of the test person more clearly, while this event cannot
be observed in the other sensing signals. To have a closer look at
the sensing signals, we further show in Fig. 3 one segment
extracted from each of the 9 distinct sensing signals. In this
figure, the fourth acoustic signal is corrupted due to a sensor
failure during the collection process.

3. Feature extraction. After segmentation, we extract cepstral
features [28] from each segment and keep the first 500
coefficients for classification. Cepstral features have proven
effective in speech recognition and acoustic signal
classification. The feature dimension, given by the number of
extracted cepstral features, is n = 500.
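As a rough illustration of the feature extraction step, the sketch below computes the real cepstrum of one segment (inverse Fourier transform of the log-magnitude spectrum) and keeps the first 500 coefficients. Whether this matches the exact cepstral definition of [28] is an assumption, and the function name `cepstral_features` and the small constant `eps` guarding the logarithm are introduced here for illustration only.

```python
import numpy as np

def cepstral_features(segment, n_coeffs=500, eps=1e-12):
    """Real cepstrum of a 1-D segment, truncated to the first n_coeffs values."""
    spectrum = np.fft.rfft(segment)
    log_magnitude = np.log(np.abs(spectrum) + eps)   # eps avoids log(0)
    cepstrum = np.fft.irfft(log_magnitude, n=len(segment))
    return cepstrum[:n_coeffs]

# Toy usage on a synthetic 30000-sample segment (hypothetical data).
rng = np.random.default_rng(0)
features = cepstral_features(rng.standard_normal(30000))
```

Stacking the feature vectors of the 10 segments of one sensor column-wise would give a 500 x 10 matrix playing the role of $Y^i$ in Section II.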

B.  Two  class  problem 

First, we demonstrate the benefit of exploiting correlations
between multiple sensors over the use of a single sensor on a
two-class classification problem, namely classifying human and
human-animal footsteps. In this experiment, we use the DEC10 data
for training and the DEC09 data for testing, which leads to 36
training and 31 testing samples. For the ith sensor, the
corresponding training dictionary $X^i$ is constructed from all
the cepstral feature segments extracted from the 36 training
signals. In our experiments, 10 segments are taken from each
individual sensor signal. Therefore, each training dictionary
$X^i$, $i = 1, \ldots, 9$, is of size 500 x 360 and the associated
observation matrix $Y^i$ is of size 500 x 10, where 500 is the
feature dimension.

Fig. 2. Signals captured by four sensors including acoustic, seismic, PIR
and ultrasonic sensors.

Classification performance is summarized in Table II. The first column
lists the methods used in our experiments: multivariate sparse
representation for classification (MV-SRC) applied to each of the 9
sensors separately, and multi-task MV-SRC (MTMV-SRC) applied to
different combinations of sensors. We note that the first four sensors
are acoustic, the next three (sensors 5 to 7) are seismic, and the last
two are PIR and ultrasonic sensors, respectively. The second and third
columns report the classification accuracy for human (H) and
human-animal (HA) footsteps, and the last column is the overall
accuracy (OA). As can be seen from Table II, MTMV-SRC using the first
two acoustic sensors jointly outperforms MV-SRC using either sensor
separately. Similar behavior is observed with the three seismic sensors.
When sensors 1, 2 and 5-9 are combined (the last row of Table II),
MTMV-SRC yields the best overall accuracy.
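This section does not state how the overall accuracy is computed. The tabulated values are consistent with the unweighted mean of the H and HA columns, which the short check below verifies for a few rows of Table II. This is an inference from the reported numbers, not a formula given in the paper, and the function name is hypothetical.

```python
def overall_accuracy(h_acc, ha_acc):
    """Unweighted mean of the two per-class accuracies (in percent)."""
    return (h_acc + ha_acc) / 2.0


# (H, HA, OA) triples copied from Table II.
rows = {
    "MV-SRC sensor 1": (75.00, 46.67, 60.84),
    "MV-SRC sensor 2": (75.00, 53.33, 64.17),
    "MTMV-SRC sensors 1-2": (81.25, 60.00, 70.63),
    "MTMV-SRC sensors 1, 2, 5-9": (81.25, 66.67, 73.96),
}

for name, (h, ha, oa) in rows.items():
    # Allow 0.01 of slack because the table entries are already rounded.
    assert abs(overall_accuracy(h, ha) - oa) <= 0.01, name
    print(f"{name}: mean(H, HA) = {overall_accuracy(h, ha):.3f}, table OA = {oa:.2f}")
```

Under this reading, OA weights the two classes equally regardless of how many test samples each class contains.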

During experimentation, we noticed that half of the testing data
collected from acoustic sensors 3 and 4 in DEC09 is completely noisy
due to the malfunction of these two sensors in December 09 (see Fig. 3
for a noisy segment extracted from the 4th acoustic sensor). This
explains why the classification performance of these two sensors is
quite low compared to sensors 1 and 2.

Fig. 3. Signal segments of length 30000 captured by sets of 9 sensors
including acoustic, seismic, PIR and ultrasonic sensors (panels include
the third and fourth acoustic sensors and the first seismic sensor).

TABLE II
Classification accuracy (%) for the two-class problem with training
samples taken from DEC10, comparing MV-SRC to MTMV-SRC with different
combinations of sensors.

Methods                         H       HA      OA
MV-SRC sensor 1                 75.00   46.67   60.84
MV-SRC sensor 2                 75.00   53.33   64.17
MV-SRC sensor 3                 37.50   73.33   55.42
MV-SRC sensor 4                 75.00   33.33   54.17
MV-SRC sensor 5                 50.00   66.67   58.33
MV-SRC sensor 6                 56.25   66.67   61.46
MV-SRC sensor 7                 56.25   66.67   61.46
MV-SRC sensor 8                 31.25   73.33   52.29
MV-SRC sensor 9                 68.75   53.33   61.04
MTMV-SRC sensors 1-2            81.25   60.00   70.63
MTMV-SRC sensors 5-7            75.00   66.67   70.84
MTMV-SRC sensors 1, 2, 5-9      81.25   66.67   73.96

TABLE III
Classification accuracy (%) for the two-class problem with training
samples taken from DEC10, comparing MTMV-SRC with conventional
classifiers. The 7 sensors 1, 2 and 5-9 are used in this experiment.

Methods                         H       HA      OA
MTMV-SRC                        81.25   66.67   73.96
SLR                             81.25   53.33   67.29
Group SLR                       75.00   66.67   70.84
Kernel SLR                      87.50   46.67   67.09
Kernel group SLR                87.50   53.33   70.42
SVM                             81.25   53.33   67.29
Kernel SVM                      81.25   60.00   70.63

The next experiment compares our proposed approach with current
state-of-the-art classification methods such as (group) sparse logistic
regression (SLR) [29], kernel (group) SLR [30], linear support vector
machine (SVM) and kernel SVM [31]. In this experiment, 7 sensors, namely
sensors 1, 2 and 5-9, are utilized. To exploit correlation across tasks
(sensors) in the logistic regression method, we utilize the
heterogeneous model proposed in [30]. The main idea is to associate each
training dictionary $X^i \in \mathbb{R}^{n \times p}$ with a coefficient
vector $w^i \in \mathbb{R}^n$, and the logistic loss is taken over the
sum of all the tasks. Lasso or group Lasso regularization can be
incorporated into the optimization to retrieve the coefficient vectors
$w^i$, $i = 1, \ldots, 7$. Each segment of the test sample is then
assigned to a class, and the final decision is made by selecting the
label that occurs most often. For SVM, multiple tasks are incorporated
by concatenating all 7 training dictionaries to form a large dictionary
$X \in \mathbb{R}^{7n \times p}$. The SVM algorithm is then run on $X$,
and a voting scheme is employed to assign the class label. For the
kernel versions, we use an RBF kernel with bandwidth selected via
cross-validation. As one can observe from Table III, our approach
achieves the highest overall accuracy among all the classifiers
compared.
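The segment-level voting used for the baseline classifiers can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the per-segment labels are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the text does not specify one.

```python
from collections import Counter


def majority_vote(segment_labels):
    """Return the label predicted for the largest number of segments.

    Ties are broken in favour of the label seen first (an assumption;
    the text does not specify a tie-breaking rule).
    """
    counts = Counter(segment_labels)
    return counts.most_common(1)[0][0]


# Hypothetical per-segment predictions for one test sample (10 segments).
segment_labels = ["H", "HA", "H", "H", "HA", "H", "H", "HA", "H", "H"]
print(majority_vote(segment_labels))   # 7 of 10 segments vote "H"
```

Because each test sample contributes 10 segments, a single misclassified segment does not change the sample-level decision unless the vote is close.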

To further show the effectiveness of our approach, we repeat the same
experiments using the DEC09 data for training and the DEC10 data for
testing. The classification performance is reported in Tables IV and V.
In these experiments, we exclude the 3rd and 4th sensors due to the
sensor malfunction in December 09. It can be seen from Table IV that
incorporating all 7 sensors yields the best overall classification
accuracy. Table V compares our approach to traditional classifiers. One
can observe that MTMV-SRC is comparable to kernel SVM and kernel group
SLR. However, we will experimentally show in the next section that, by
taking clutter noise into account, our MTMV-SRC+N algorithm considerably
outperforms all other methods.

TABLE IV
Classification accuracy (%) for the two-class problem with training
samples taken from DEC09, comparing MV-SRC to MTMV-SRC with different
combinations of sensors.

Methods                         H       HA      OA
MV-SRC sensor 1                 66.67   40.00   53.34
MV-SRC sensor 2                 88.89   35.00   61.95
MV-SRC sensor 5                 50.00   85.00   67.50
MV-SRC sensor 6                 66.67   65.00   65.84
MV-SRC sensor 7                 27.78   70.00   48.89
MV-SRC sensor 8                 88.89   25.00   56.95
MV-SRC sensor 9                 66.67   20.00   43.34
MTMV-SRC sensors 1-2            94.44   35.00   64.72
MTMV-SRC sensors 5-7            38.89   95.00   66.95
MTMV-SRC sensors 1, 2, 5-9      77.78   65.00   71.39

TABLE V
Classification accuracy (%) for the two-class problem with training
samples taken from DEC09, comparing MTMV-SRC with conventional
classifiers. The 7 sensors 1, 2 and 5-9 are used in this experiment.

Methods                         H       HA      OA
MTMV-SRC                        77.78   65.00   71.39
SLR                             55.56   80.00   67.78
Group SLR                       55.56   85.00   70.28
Kernel SLR                      88.89   50.00   69.45
Kernel group SLR                88.89   55.00   71.95
SVM                             77.78   60.00   68.89
Kernel SVM                      77.78   65.00   71.39

C. Dealing with arbitrarily large errors

This section shows the significance of adding an $\ell_1$-regularization
term to the MTMV-SRC model. This regularization is expected to
compensate for unwanted clutter noise introduced into the signals during
data collection. We conduct similar two-class classification experiments
with training samples taken from the DEC10 data and report the results
in Table VI. In addition, Table VII shows the results with training
samples taken from the DEC09 data. It can be seen from the tables that,
by encouraging sparsity in the noise term, the overall accuracy of
MTMV-SRC on sensors 1, 2 and 5-9 improves from 73.96% to 77.29% with
DEC10 training and from 71.39% to 76.11% with DEC09 training. The gain
comes mainly from the human-animal class.
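In alternating-direction solvers, an $\ell_1$ penalty on a noise term is commonly handled by elementwise soft-thresholding, which is the proximal operator of the $\ell_1$ norm. The sketch below shows only that generic operator; it is not the authors' update step, and the residual matrix `R`, the noise estimate `N` and the threshold `tau` are illustrative assumptions.

```python
import numpy as np


def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1: shrink every entry toward zero by tau."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


# Hypothetical residual between observations and their sparse representation:
# mostly small entries plus two large clutter spikes.
R = np.array([[0.05, -0.02, 3.00],
              [0.01, -2.50, 0.03]])
N = soft_threshold(R, tau=0.1)
print(N)   # only the two large entries survive, each shrunk by 0.1
```

Small residual entries are set to zero while large ones are kept, which is how a sparse noise term can absorb isolated large errors without affecting the rest of the representation.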

TABLE VI
Classification accuracy (%) for the two-class problem with training
samples taken from DEC10, showing the superiority of taking sparse noise
into account for both MV-SRC and MTMV-SRC methods.

Methods                         H       HA      OA
MV-SRC+N sensor 1               81.25   53.33   67.29
MV-SRC+N sensor 2               75.00   60.00   67.50
MV-SRC+N sensor 3               68.75   46.67   57.71
MV-SRC+N sensor 4               93.75    6.67   50.21
MV-SRC+N sensor 5               68.75   40.00   53.38
MV-SRC+N sensor 6               56.25   73.33   64.79
MV-SRC+N sensor 7               68.75   66.67   67.71
MV-SRC+N sensor 8               37.75   73.33   55.54
MV-SRC+N sensor 9               68.75   53.33   61.04
MTMV-SRC+N sensors 1-2          81.25   66.67   73.96
MTMV-SRC+N sensors 1-4          56.25   86.67   71.46
MTMV-SRC+N sensors 5-7          56.25   86.67   71.46
MTMV-SRC+N sensors 1, 2, 5-9    81.25   73.33   77.29

TABLE VII
Classification accuracy (%) for the two-class problem with training
samples taken from DEC09, showing the superiority of taking sparse noise
into account for both MV-SRC and MTMV-SRC methods.

Methods                         H       HA      OA
MV-SRC+N sensor 1               83.33   35.00   59.17
MV-SRC+N sensor 2               88.89   40.00   64.45
MV-SRC+N sensor 5               50.00   85.00   67.50
MV-SRC+N sensor 6               61.11   65.00   63.06
MV-SRC+N sensor 7               61.11   45.00   53.06
MV-SRC+N sensor 8               83.33   25.00   54.17
MV-SRC+N sensor 9               61.11   40.00   50.56
MTMV-SRC+N sensors 1-2          83.33   55.00   69.17
MTMV-SRC+N sensors 5-7          61.11   80.00   70.56
MTMV-SRC+N sensors 1, 2, 5-9    72.22   80.00   76.11

V. Conclusion and discussion

In this paper, we propose a novel multi-task multivariate joint
structured sparsity-based classification method (MTMV-SRC) for personnel
footstep recognition, where the data is collected from nine sensors
including acoustic, seismic, PIR and ultrasonic sensors. Our proposed
approach shows how to efficiently exploit correlations between sensors
measuring the same physical events. Experimental results demonstrate
that our method improves on single-sensor classification and outperforms
many classical classifiers such as (group) sparse logistic regression
(SLR), support vector machine (SVM) and their kernel versions.
Furthermore, we extend our model to deal with large clutter noise, an
extension that is indispensable in many practical scenarios. We
experimentally illustrate the importance of adding a sparse noise
regularization term to MTMV-SRC to remove arbitrarily large noise. This
model significantly improves the overall classification accuracy.
Lastly, we propose a fast first-order algorithm based on the classical
alternating direction method to solve the aforementioned models.